In this report, we create wordcloud plots for the philosophy dataset and explore word patterns across philosophers from the same school and across books written by the same philosopher.

Data Import and Cleaning

Data quality is the most important consideration when preparing data. First, we check whether the data contains any missing or duplicated values and inspect the type of each column.
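The checks described here could look roughly like the following sketch. It assumes pandas; the column names, the sample rows and the file name in the comment are hypothetical stand-ins, not taken from the actual dataset.

```python
import pandas as pd

# Tiny stand-in for the real dataset; column names and rows are assumptions.
df = pd.DataFrame(
    {
        "title": ["Ethics", "Ethics", "Republic"],
        "author": ["Spinoza", "Spinoza", "Plato"],
        "school": ["rationalism", "rationalism", "plato"],
        "sentence_length": [120, 95, 80],
    }
)
# With the real file this would be something like:
# df = pd.read_csv("philosophy_data.csv")  # hypothetical file name

missing_per_column = df.isna().sum()        # missing values per column
n_duplicates = int(df.duplicated().sum())   # fully duplicated rows
n_rows, n_cols = df.shape                   # records and columns

print(missing_per_column)
print("duplicated rows:", n_duplicates)
print("shape:", (n_rows, n_cols))
print(df.dtypes)                            # type of each column
```

`df.duplicated()` flags only rows that repeat an earlier row in every column, so two different sentences from the same book are not counted as duplicates.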

As shown above, the dataset contains no missing or duplicated data. There are 360808 records and 11 columns in total.

Exploratory Data Analysis

First, we look at the original publication dates of each author's books and the year in which each school emerged.

We can see that Plato and Aristotle published their works before Christ (BC).

Next, we look at how many authors there are in each school and how many books each author wrote.

We can see the specific authors in each school from the table below:

The plot shows that the Analytic school has the most authors (7), while the Aristotle, Nietzsche and Plato schools each have only 1 author.
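Counts like these can be obtained with a group-by and a distinct count. The sketch below assumes pandas and one row per sentence; the column names `school`, `author` and `title` and the sample rows are hypothetical.

```python
import pandas as pd

# One row per sentence; values are illustrative, not from the dataset.
df = pd.DataFrame(
    {
        "school": ["empiricism", "empiricism", "empiricism",
                   "nietzsche", "nietzsche", "plato"],
        "author": ["Locke", "Locke", "Hume",
                   "Nietzsche", "Nietzsche", "Plato"],
        "title": ["Essay Concerning Human Understanding",
                  "Essay Concerning Human Understanding",
                  "A Treatise of Human Nature",
                  "Beyond Good and Evil", "Ecce Homo", "Complete Works"],
    }
)

# Distinct authors in each school, largest first.
authors_per_school = (
    df.groupby("school")["author"].nunique().sort_values(ascending=False)
)

# Distinct book titles for each author, largest first.
books_per_author = (
    df.groupby("author")["title"].nunique().sort_values(ascending=False)
)

print(authors_per_school)
print(books_per_author)
```

Using `nunique()` rather than `count()` matters here: because each row is a sentence, `count()` would return the number of sentences instead of the number of authors or books.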

Among all the authors, Nietzsche wrote the most books (5), while most authors wrote only 1 book. We can also see the sentence length that each author tends to use in their books:

Descartes uses the longest sentences, with a mean sentence length of 248 words. Wittgenstein uses the shortest sentences, with a mean length of only 85 words.

We can also look at the sentence length for each school. Capitalism books have the longest sentences, at 188 words on average; Empiricism and German Idealism also average more than 180 words per sentence. Nietzsche and Plato books have the shortest sentences, at 116 and 114 words on average, respectively.
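A minimal sketch of how such averages might be computed, assuming pandas and a numeric `sentence_length` column (a hypothetical name). The numbers below are made up for illustration, and whether the length counts words or characters depends on how the column was built in the real data.

```python
import pandas as pd

# Illustrative values only; not the figures reported above.
df = pd.DataFrame(
    {
        "school": ["rationalism", "rationalism", "analytic",
                   "analytic", "plato", "plato"],
        "author": ["Descartes", "Descartes", "Wittgenstein",
                   "Wittgenstein", "Plato", "Plato"],
        "sentence_length": [250, 240, 80, 90, 110, 120],
    }
)

# Mean sentence length per author, longest first.
mean_len_by_author = (
    df.groupby("author")["sentence_length"].mean().sort_values(ascending=False)
)

# The same aggregation at the school level.
mean_len_by_school = (
    df.groupby("school")["sentence_length"].mean().sort_values(ascending=False)
)

print(mean_len_by_author)
print(mean_len_by_school)
```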

Wordcloud Analysis

Now that we have gained an overall understanding of the data, we can make wordclouds to see the word patterns in each book.

We may wonder whether all the authors in the same school use similar words in their books. To analyze this, we take Empiricism as an example and draw a wordcloud plot for 3 writers from this school:

We can see that for Locke, the main words are idea, mind, name, understanding, men, general, substance, nature, reason, knowledge, etc. For Hume, the main words are object, nature, reasoning, mind, passion, action, imagination, relation, manner, impression, effect, order, cause, etc. For Berkeley, the main words are perceived, spirit, sensation, extension, mind, sense, principle, etc.

Several words are used by all 3 authors, such as idea, nature and mind. However, most of the words they use are quite different. Therefore, we cannot say that these 3 authors from the same school are very similar in their use of language.
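A wordcloud is a picture of word frequencies, so the comparison above can also be made directly on the counts. The sketch below uses only the standard library; the sample sentences and the tiny stop-word list are hypothetical and hand-written, not taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical sample text per author; not quoted from the dataset.
texts = {
    "Locke": "the idea of the mind and the understanding of general ideas in the mind",
    "Hume": "the mind receives an impression and the imagination forms a relation of cause and effect",
}

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "an", "a"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, keep alphabetic tokens, drop stop words, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


freqs = {author: word_frequencies(text) for author, text in texts.items()}
for author, counter in freqs.items():
    print(author, counter.most_common(3))

# Words that appear in both authors' vocabularies.
shared = set(freqs["Locke"]) & set(freqs["Hume"])
print("shared words:", sorted(shared))
```

If the `wordcloud` package is used for plotting (an assumption; the report does not name its library), a frequency mapping like `freqs["Locke"]` can be passed to `WordCloud.generate_from_frequencies` to draw the plot.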

Apart from authors from the same school, we can also explore books written by the same author. Take Nietzsche as an example: he wrote 5 books in total between 1886 and 1888. Making a wordcloud plot for each book shows the high-frequency words he uses in each one:

In the first book, Beyond Good and Evil (1886), the words he uses most are german, soul, moral, hither, man, life, nature, europe, one, etc.

In the second book, Thus Spoke Zarathustra (1887), the high-frequency words are zarathustra, one, men, spirit, people, truth, etc. This book has few meaningful high-frequency words.

In the third book, The Antichrist (1888), the high-frequency words are christianity, church, nature, god, world, life, instinct, feeling, reality, jew, concept, etc.

In the fourth book, Ecce Homo (1888), the high-frequency words are german, zarathustra, wagner, man, people, life, culture, truth, whold, value, morality, etc.

In the fifth book, Twilight of the Idols (1888), the high-frequency words are life, value, world, morality, christian, even, nature, german, christianity, virtue, energy, truth, instinct, etc.

We can see that although some words, such as german, christian and morality, are common to these books, the majority of the words used are still different. The difference is especially large among the 3 books published in the same year (1888). Therefore, it is hard to say that a particular writer mostly uses the same words across his books.
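One way to put a number on "mostly different" is the Jaccard similarity between the sets of high-frequency words, that is, the size of the intersection divided by the size of the union. The sketch below is an illustration rather than part of the original analysis. It uses the words listed above for three of the books; those lists are partial (each ends with "etc."), so the scores are only indicative.

```python
from itertools import combinations

# High-frequency words as listed in this report (partial lists).
top_words = {
    "Beyond Good and Evil": {
        "german", "soul", "moral", "hither", "man", "life",
        "nature", "europe", "one",
    },
    "The Antichrist": {
        "christianity", "church", "nature", "god", "world", "life",
        "instinct", "feeling", "reality", "jew", "concept",
    },
    "Twilight of the Idols": {
        "life", "value", "world", "morality", "christian", "even",
        "nature", "german", "christianity", "virtue", "energy",
        "truth", "instinct",
    },
}


def jaccard(a, b):
    """Share of words common to both sets, out of all words in either set."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Pairwise similarity between every two books.
similarity = {
    (x, y): jaccard(top_words[x], top_words[y])
    for x, y in combinations(top_words, 2)
}

for (x, y), score in similarity.items():
    print(f"{x} vs {y}: {score:.2f}")
```

A score of 0 means no listed words in common and 1 means identical lists, so low scores are consistent with the conclusion that the vocabularies differ from book to book.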